Indian Winter
Making of a 256 byte intro
 
This article deals with the 256 byte intro "Indian Winter", my contribution to the competition at Demobit 2018, a demoparty held in Bratislava, Slovakia from February 2nd to 4th, 2018. I explain how the intro works and give some hints on how to optimize x86 Assembler code (especially with regard to the size of the executable).
Written by Claus Volko
Vienna, Austria, Europe
Contact: cdvolko (at) gmail (dot) com
Homepage: www.cdvolko.net
The 256 byte intro "Indian Winter" is my contribution to the Demobit 2018 competition. It is based on the code of my older intro "Indian Summer", which I submitted to the demoparty 0a000h back in 2008. I mostly tweaked some parameters to achieve a different visual appearance - in my humble opinion, the result looks very noble. Here I am going to explain how this intro works.
The intro was coded in x86 Assembler, and you can find the source code in TASM syntax along with the intro in the official package available on the Internet. Basically the intro is all about two things:
1. Palette initialization
2. A loop in which oblique lines are drawn and, after every few lines, a blur is applied
The intro has a size of 150 bytes, which makes it considerably smaller than the 256 byte limit and also smaller than its predecessor. It may be possible to optimize the code further without changing what happens on screen, but probably not by much.
The first two lines of the actual code are the following:
        push    0a000h + 400
        pop     es
An experienced coder will immediately see that this code sets the ES register to the segment where the pixel data is stored in mode 13h. However, why does it set the value to 0a000h + 400? The segment with the pixel data, after all, starts at 0a000h!
Well, the reason is simple: because of the way this intro works, it is more convenient to set ES to this value. This is already an optimization of sorts. We are using mode 13h, which has a resolution of 320x200 pixels at one byte per pixel. However, we want to leave 20 lines on top and 20 lines on bottom empty. A segment value is multiplied by 16 to form the address, so adding 400 to 0a000h moves the start by 400 * 16 = 6400 bytes, and as 6400 / 320 = 20, we start at the beginning of line 20. Since we do not want to write data to lines 0 to 19, we do not have to access them. By setting ES to 0a000h + 400, we can conveniently access the area of the screen that is relevant for us with STOSB (which stores the value of AL at ES:DI) without having to set DI to a nonzero initial value, which would cost more bytes than setting it to zero (this can be done cheaply using XOR DI,DI).
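The arithmetic above can be checked with a few lines of Python. This is only a model of real-mode addressing written for this article, not part of the intro; the names physical and first_line are made up for the illustration.

```python
# Model of real-mode addressing: physical address = segment * 16 + offset.
WIDTH = 320                       # bytes per screen line in mode 13h
VIDEO_SEGMENT = 0xA000            # segment where the mode 13h pixel data starts
ES_VALUE = VIDEO_SEGMENT + 400    # value loaded by "push 0a000h + 400 / pop es"


def physical(segment, offset):
    """Return the 20-bit physical address of segment:offset."""
    return segment * 16 + offset


def first_line(segment):
    """Return (line, column) of the pixel that segment:0 points to."""
    byte_offset = physical(segment, 0) - physical(VIDEO_SEGMENT, 0)
    return divmod(byte_offset, WIDTH)


print(first_line(ES_VALUE))       # (20, 0)
```

With DI = 0, STOSB therefore writes its first byte to the leftmost pixel of line 20.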
The next two lines are also more or less trivial:
        mov     al,13h
        int     10h
This initializes screen mode 13h. Note that we do not explicitly set AH to zero but assume that it has already been implicitly set to zero - an assumption that is safe to make according to my experience.
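To make the savings concrete, here is a small Python sketch that counts bytes for the short forms used so far and for the "safe" alternatives. The byte values are the standard 16-bit encodings, written out by hand for this illustration (an assembler may pick an equivalent encoding of the same length); the value 6400 in "mov di,6400" is just an example of a nonzero start offset.

```python
# Hand-written 16-bit encodings, used only to count bytes.
ENCODINGS = {
    "mov al,13h":   bytes([0xB0, 0x13]),
    "mov ax,0013h": bytes([0xB8, 0x13, 0x00]),
    "int 10h":      bytes([0xCD, 0x10]),
    "xor di,di":    bytes([0x33, 0xFF]),        # 31h FFh is an equivalent form
    "mov di,6400":  bytes([0xBF, 0x00, 0x19]),  # 6400 = 1900h, low byte first
}


def size(*instructions):
    """Total size in bytes of a sequence of instructions."""
    return sum(len(ENCODINGS[name]) for name in instructions)


print(size("mov al,13h", "int 10h"))    # 4 (relies on AH already being zero)
print(size("mov ax,0013h", "int 10h"))  # 5 (sets AH explicitly)
print(size("xor di,di"))                # 2
print(size("mov di,6400"))              # 3
```

One byte per trick does not sound like much, but in a 256 byte intro it adds up.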
Let's now take a look at the end of the program:
        ; variables
        x_dir   dw ?
        y_dir   dw ?
The reason why I declared these variables at the end and left them uninitialized is simply to save space. Declared this way, the variables are nothing but placeholders for memory after the program code and do not add any bytes to the executable. This helps us keep the executable as small as possible.
The palette initialization code is heavily optimized:
        mov     dx,3c9h
        mov     cl,64
        push    cx
pal_l1:
        xor     ax,ax
        out     dx,al
        pop     ax
        push    ax
        sub     ax,cx
        out     dx,al
        out     dx,al
        loop    pal_l1
        pop     cx
        xor     ax,ax
pal_l2:
        push    ax
        out     dx,al
        mov     al,63
        out     dx,al
        pop     ax
        out     dx,al
        inc     ax
        loop    pal_l2
This code probably cannot be optimized any further without changing its effect (i.e., the palette).
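To see which colours these two loops produce, the listing can be replayed in Python. This is a model written for this article, not part of the intro. It assumes that CH is zero when the program starts (so that CX is 64 after "mov cl,64") and that the DAC write index is 0 after the mode set, since the code never writes to port 3c8h; the function name build_palette is made up for the illustration.

```python
def build_palette(ch=0):
    """Replay pal_l1 and pal_l2 and return the (red, green, blue) triples.

    Assumption: CH is zero on entry (parameter ch), so CX = 64 after mov cl,64.
    Every three bytes written to port 3c9h form one palette entry.
    """
    written = []                        # bytes sent with "out dx,al"
    cx = (ch << 8) | 64                 # mov cl,64
    saved = cx                          # push cx
    while True:                         # pal_l1:
        written.append(0)               # xor ax,ax / out dx,al   -> red
        ax = (saved - cx) & 0xFFFF      # pop ax / push ax / sub ax,cx
        written.append(ax & 0xFF)       # out dx,al               -> green
        written.append(ax & 0xFF)       # out dx,al               -> blue
        cx = (cx - 1) & 0xFFFF          # loop pal_l1
        if cx == 0:
            break
    cx = saved                          # pop cx
    ax = 0                              # xor ax,ax
    while True:                         # pal_l2:
        written.append(ax & 0xFF)       # push ax / out dx,al     -> red
        written.append(63)              # mov al,63 / out dx,al   -> green
        written.append(ax & 0xFF)       # pop ax / out dx,al      -> blue
        ax = (ax + 1) & 0xFFFF          # inc ax
        cx = (cx - 1) & 0xFFFF          # loop pal_l2
        if cx == 0:
            break
    return [tuple(written[i:i + 3]) for i in range(0, len(written), 3)]


palette = build_palette()
print(len(palette))    # 128
print(palette[63])     # (0, 63, 63)
print(palette[64])     # (0, 63, 0)
print(palette[127])    # (63, 63, 63)
```

Under these assumptions, colours 0 to 63 are (0, i, i) and colours 64 to 127 are (i, 63, i). Note how "pop ax / push ax" reads the saved counter without needing a second register or a memory variable.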
In the code between the labels "inner_loop" and "loop inner_loop", the oblique lines are drawn. Note that x_dir and y_dir are variables which are added to DI and that x_dir by default is 2 and y_dir by default is 320 (i.e., the default movement is two pixels to the right and one line downwards). When the direction changes, this is achieved conveniently by the NEG instruction. The instruction "div word ptr xscreen + 4" is a trick to keep the number 320 from appearing in the code more often than necessary: as it already appears four bytes after the label "xscreen", the instruction tells the CPU to simply take the value from there. This also serves as a size optimization.
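The following Python sketch models the two building blocks mentioned here: dividing a screen offset by 320 and reversing a direction with NEG. It is not a transcription of the inner loop (see the source code in the package for that); the helper names are made up, and the division assumes that DX is zero so that the dividend is simply the 16-bit offset.

```python
WIDTH = 320


def div_by_width(offset):
    """Model of a 16-bit DIV by 320 with DX = 0 (assumed):
    the quotient (line) ends up in AX, the remainder (column) in DX."""
    return divmod(offset & 0xFFFF, WIDTH)


def neg16(value):
    """Model of NEG on a 16-bit word (two's complement)."""
    return (-value) & 0xFFFF


def step(di, x_dir, y_dir):
    """Add both direction words to DI, wrapping at 16 bits like ADD does."""
    return (di + x_dir + y_dir) & 0xFFFF


di = 10 * WIDTH + 100                 # line 10, column 100
di = step(di, 2, 320)                 # default direction
print(div_by_width(di))               # (11, 102)
di = step(di, neg16(2), 320)          # x_dir after NEG: two pixels to the left
print(div_by_width(di))               # (12, 100)
```

Because the addition wraps around at 16 bits, adding the negated word 0fffeh has the same effect as subtracting 2, so the drawing code itself does not need to know which way the line is currently going.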
Now take a look at this:
        in      al,60h
        dec     ax
        je      ende
This checks for the ESC key and exits the program if it is pressed. In that case, the instruction "in al,60h" sets AL to 1, the scan code of ESC. As AH is zero at this point, which the code relies on, AX is then equal to 1 as well. The most efficient way to test whether a variable is equal to 1 is to decrement it and then check whether it is zero, provided that you can afford to have the variable modified in the course of this. This is what is being done here: "dec ax" takes one byte, whereas "cmp al,1" would take two.
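A tiny Python model shows when this test fires. It is a sketch for this article only; the function name exits and the parameter ah (the content of AH before the IN instruction) are made up for the illustration.

```python
ESC_MAKE_CODE = 0x01     # scan code sent when ESC is pressed
ESC_BREAK_CODE = 0x81    # scan code sent when ESC is released


def exits(al_from_port, ah=0):
    """Model of "in al,60h / dec ax / je ende".

    al_from_port is the byte read from port 60h, ah the previous content
    of AH (assumed to be zero in the intro).
    """
    ax = (ah << 8) | al_from_port
    ax = (ax - 1) & 0xFFFF           # dec ax
    return ax == 0                   # je ende jumps if the result is zero


print(exits(ESC_MAKE_CODE))              # True
print(exits(ESC_BREAK_CODE))             # False
print(exits(ESC_MAKE_CODE, ah=0x12))     # False
```

The last line shows the price of the trick: if AH were not zero, pressing ESC would go unnoticed, so the one-byte "dec ax" is only safe where AH is known to be zero.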
The blur part is also heavily optimized and I would be really curious if you managed to get it any smaller.
Well, that's it! I hope you enjoyed this "making of"!
Claus Volko